Identifying overlapping terrorist cells from the Noordin Top actor-event network
Actor-event data are common in sociological settings, whereby one registers
the pattern of attendance of a group of social actors at a number of events. We
focus on 79 members of the Noordin Top terrorist network, who were monitored
attending 45 events. The attendance or non-attendance of the terrorists at
events defines the social fabric, such as group coherence and social
communities. The aim of the analysis of such data is to learn about the
affiliation structure. Actor-event data are often transformed to actor-actor
data in order to be further analysed by network models, such as stochastic
block models. This transformation and such analyses lead to a natural loss of
information, particularly when one is interested in identifying, possibly
overlapping, subgroups or communities of actors on the basis of their
attendance at events. In this paper we propose an actor-event model for
overlapping communities of terrorists, which simplifies interpretation of the
network. Specifically, we define a mixture model with overlapping clusters for
the analysis of binary actor-event network data, called manet, and develop a
Bayesian procedure for inference. After a simulation study, we show how this
analysis of the terrorist network has clear interpretative advantages over the
more traditional approaches of affiliation network analysis.
Comment: 24 pages, 5 figures; related R package (manet) available on CRAN
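The information loss caused by the actor-actor transformation can be seen on a toy case. The sketch below, in plain Python with hypothetical matrices rather than the Noordin Top data, projects two different binary actor-event matrices onto a dichotomized actor-actor network. The function names `co_attendance` and `dichotomize` are illustrative only and are not part of the manet package.

```python
def co_attendance(x):
    """One-mode projection X X^T of a binary actor-event matrix X
    (rows = actors, columns = events): entry (i, j) counts the events
    attended by both actor i and actor j."""
    n = len(x)
    return [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)]
            for i in range(n)]


def dichotomize(c):
    """Binary actor-actor network: i ~ j if they co-attended any event."""
    n = len(c)
    return [[1 if i != j and c[i][j] > 0 else 0 for j in range(n)]
            for i in range(n)]


# Hypothetical toy data, not from the Noordin Top network.
# (a) a single event attended by all three actors
x_one_event = [[1],
               [1],
               [1]]
# (b) three events, each attended by a different pair of actors
x_three_pairs = [[1, 1, 0],
                 [1, 0, 1],
                 [0, 1, 1]]

net_a = dichotomize(co_attendance(x_one_event))
net_b = dichotomize(co_attendance(x_three_pairs))
print(net_a == net_b)  # True: the two actor-actor networks coincide
```

Both affiliation structures collapse to the same triangle, so a model fitted to the actor-actor network alone cannot tell a three-way meeting from three separate pairwise meetings; an actor-event model works on the original matrix and keeps that distinction.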
Mixtures of multivariate generalized linear models with overlapping clusters
With the advent of ubiquitous monitoring and measurement protocols, studies
have started to focus more and more on complex, multivariate and heterogeneous
datasets. In such studies, multivariate response variables are drawn from a
heterogeneous population, often in the presence of additional covariate
information. To deal with this intrinsic heterogeneity, regression
analyses have to be clustered for different groups of units. Until now,
mixture model approaches have assigned units to distinct, non-overlapping groups.
However, these units not infrequently exhibit a more complex organization and
clustering. Our aim is to define a mixture of generalized linear models with
overlapping clusters of units. This crucially involves an overlap function
that maps the coefficients of the parent clusters into the coefficients of
the multiple-allocation units. We present a computationally efficient MCMC
scheme that samples the posterior distribution of the parameters in the model.
An example on a two-mode network study shows details of the implementation in
the case of a multivariate probit regression setting. A simulation study shows
the overall performance of the method, whereas an illustration of the voting
behaviour on the US Supreme Court shows how the 9 justices split into two
overlapping sets of justices.
Comment: 24 pages, 3 figures
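To illustrate what an overlap function does, the sketch below assumes one simple hypothetical choice: a unit allocated to several clusters receives the average of its parent clusters' coefficient vectors. This is not necessarily the overlap function used in the paper. It also uses a single binary response with a probit link instead of the multivariate probit setting. `overlap_mean` and `probit_prob` are illustrative names.

```python
import math


def std_normal_cdf(t):
    """Standard normal CDF, the inverse link of a probit model."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def overlap_mean(betas, z):
    """Hypothetical overlap function: the coefficient vector of a unit is the
    average of the coefficient vectors of the parent clusters it belongs to.
    betas: one coefficient vector per cluster; z: 0/1 allocation vector."""
    parents = [b for b, zk in zip(betas, z) if zk == 1]
    if not parents:
        raise ValueError("a unit must belong to at least one cluster")
    return [sum(col) / len(parents) for col in zip(*parents)]


def probit_prob(x, beta):
    """P(y = 1 | x) = Phi(x' beta) for a single binary response."""
    return std_normal_cdf(sum(xj * bj for xj, bj in zip(x, beta)))


# Two parent clusters with illustrative coefficients (intercept, slope).
betas = [[1.0, 0.5], [-1.0, 0.5]]
single = overlap_mean(betas, [1, 0])    # unit allocated to cluster 1 only
shared = overlap_mean(betas, [1, 1])    # unit allocated to both clusters
print(single, shared)                   # [1.0, 0.5] [0.0, 0.5]
print(probit_prob([1.0, 0.0], shared))  # 0.5
```

Because the coefficients of multiple-allocation units are a deterministic function of the parent clusters' coefficients, overlapping units add no free parameters. With the averaging choice above, a unit shared by two clusters with opposite intercepts ends up with a neutral intercept.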
Semiparametric finite mixture of regression models with Bayesian P-splines
A semiparametric finite mixture of regression models is defined, with concomitant information assumed to influence both the component weights and the conditional means. The contribution of a concomitant variable is flexibly specified as a smooth function represented by cubic splines. A Bayesian estimation procedure is proposed, and an empirical analysis of the baseball salaries dataset illustrates the method.
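The smooth function of the concomitant variable is built from a spline basis. The sketch below shows only two building blocks: a cubic B-spline basis evaluated with the Cox-de Boor recursion, and the second-order difference penalty that defines P-splines, whose Bayesian counterpart is a second-order random-walk prior on the spline coefficients. The equally spaced knots on [0, 1] and the number of basis functions are illustrative assumptions. The mixture weights, the conditional means and the Bayesian sampler are not reproduced here.

```python
def bspline_basis(x, knots, degree=3):
    """Values at x of all B-spline basis functions of the given degree,
    computed with the Cox-de Boor recursion (knots non-decreasing)."""
    # Degree 0: indicator of the knot interval [t_i, t_{i+1}) containing x.
    b = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
         for i in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        nxt = []
        for i in range(len(knots) - d - 1):
            left = right = 0.0
            if knots[i + d] > knots[i]:
                left = (x - knots[i]) / (knots[i + d] - knots[i]) * b[i]
            if knots[i + d + 1] > knots[i + 1]:
                right = ((knots[i + d + 1] - x)
                         / (knots[i + d + 1] - knots[i + 1]) * b[i + 1])
            nxt.append(left + right)
        b = nxt
    return b


def smooth(x, coefs, knots, degree=3):
    """Spline f(x) = sum_j a_j B_j(x)."""
    return sum(a * bj for a, bj in zip(coefs, bspline_basis(x, knots, degree)))


def second_diff_penalty(coefs):
    """P-spline roughness penalty sum_j (a_j - 2 a_{j-1} + a_{j-2})^2."""
    return sum((coefs[j] - 2 * coefs[j - 1] + coefs[j - 2]) ** 2
               for j in range(2, len(coefs)))


# Equally spaced knots on [0, 1] (4 segments), extended by 3 knots per side.
knots = [0.25 * (i - 3) for i in range(11)]
n_basis = len(knots) - 3 - 1  # 7 cubic basis functions

print(round(smooth(0.4, [1.0] * n_basis, knots), 12))  # 1.0 (partition of unity)
print(second_diff_penalty([float(j) for j in range(n_basis)]))  # 0.0
```

Linearly increasing coefficients incur no penalty, so a strong penalty (or a tight random-walk prior) shrinks the fitted function towards a straight line rather than towards zero.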
Bayesian variable selection in linear regression models with non-normal errors
This paper addresses two crucial issues in multiple linear regression analysis: (i) error terms whose distribution is non-normal because of asymmetry of the response variable and/or data coming from heterogeneous populations; (ii) selection of the regressors that effectively contribute to explaining patterns in the observations and are relevant for predicting the dependent variable. A solution to the first issue can be obtained by modelling the distribution of the error terms with a finite mixture of Gaussian distributions. In this paper we use this approach to specify a Bayesian linear regression model with non-normal errors; furthermore, by embedding Bayesian variable selection techniques in the specification of the model, we perform estimation and variable selection simultaneously. Both tasks are accomplished by sampling from the posterior distributions associated with the model. The performance of the proposed methodology is evaluated on simulated datasets in comparison with other approaches, and the results of an analysis of a real dataset are also provided. The methods developed in this paper are shown to perform well when the distribution of the error terms is characterised by heavy tails, skewness and/or multimodality.
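Modelling the errors as a finite mixture of Gaussians gives the density f(e) = Σ_k w_k N(e; μ_k, σ_k²). The sketch below computes that density and the conditional probabilities P(k | e) that drive the component-allocation step of a typical Gibbs sampler for mixtures. The component parameters are made up for illustration. They are chosen so that the mixture has mean zero (0.7 · (−0.6) + 0.3 · 1.4 = 0), an assumed constraint that is one common way to keep the regression intercept identifiable, not necessarily the one used in the paper.

```python
import math


def normal_pdf(e, mu, sigma):
    """Density of N(mu, sigma^2) at e."""
    return math.exp(-0.5 * ((e - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(e, weights, means, sds):
    """Error density f(e) = sum_k w_k N(e; mu_k, sigma_k^2)."""
    return sum(w * normal_pdf(e, m, s) for w, m, s in zip(weights, means, sds))


def allocation_probs(e, weights, means, sds):
    """P(k | e) = w_k N(e; mu_k, sigma_k^2) / f(e). In a Gibbs sampler the
    latent component label of each residual is drawn from these."""
    terms = [w * normal_pdf(e, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(terms)
    return [t / total for t in terms]


# Illustrative two-component, right-skewed, zero-mean error distribution.
weights, means, sds = [0.7, 0.3], [-0.6, 1.4], [0.8, 1.5]

probs = allocation_probs(3.0, weights, means, sds)
print(probs)  # a large positive residual is attributed mostly to component 2
```

Given the allocations, each component's parameters and the regression coefficients can be updated with standard conjugate Gaussian steps. This is why finite Gaussian mixtures are a convenient way to capture skewness, heavy tails or multimodality in the errors.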
Bayesian variable selection in linear regression models with non-normal errors
Multiple linear regression is a prime statistical tool used to discover potential
relationships between an outcome and some explanatory variables of interest.
A commonly required assumption is that the error terms in the model are
Gaussian. Instead of assuming normality, an alternative is to use a finite mixture of
normal distributions, allowing for a more flexible definition of the heterogeneity structure
of the data. We use this approach to develop a Bayesian linear regression model
with non-normal errors, and through variable selection we focus on finding active
predictors effectively contributing to explaining patterns in the observations.
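Bayesian variable selection is often expressed through binary inclusion indicators γ_j that switch each predictor on or off in the linear predictor. The sketch below evaluates the log-likelihood of one configuration γ when the errors follow a finite Gaussian mixture. Here the mixture is a zero-mean scale mixture, which has heavier tails than a single Gaussian. The toy data, coefficients and mixture parameters are hypothetical. A full sampler would combine this likelihood with priors on γ, β and the mixture parameters, and those priors are omitted here.

```python
import math


def normal_pdf(e, mu, sigma):
    """Density of N(mu, sigma^2) at e."""
    return math.exp(-0.5 * ((e - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(y, X, beta, gamma, weights, means, sds):
    """Log-likelihood of y = X (gamma * beta) + e, with e drawn from a finite
    Gaussian mixture. gamma[j] in {0, 1} switches predictor j on or off."""
    total = 0.0
    for yi, xi in zip(y, X):
        fitted = sum(g * b * x for g, b, x in zip(gamma, beta, xi))
        resid = yi - fitted
        dens = sum(w * normal_pdf(resid, m, s)
                   for w, m, s in zip(weights, means, sds))
        total += math.log(dens)
    return total


# Toy data: y depends on the first predictor only.
y = [1.0, 2.0, 3.0]
X = [[1.0, 0.5], [2.0, -1.0], [3.0, 0.2]]
beta = [1.0, 2.0]
# Zero-mean scale mixture: heavier tails than a single Gaussian.
weights, means, sds = [0.8, 0.2], [0.0, 0.0], [1.0, 3.0]

ll_first_only = log_likelihood(y, X, beta, [1, 0], weights, means, sds)
ll_both = log_likelihood(y, X, beta, [1, 1], weights, means, sds)
print(ll_first_only > ll_both)  # True
```

Switching off the irrelevant second predictor leaves zero residuals here, so that configuration of γ has the higher likelihood. In a sampler, the posterior frequency with which each γ_j equals 1 then serves as a measure of how strongly the data support predictor j.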
Fused graphical lasso for brain networks with symmetries
Neuroimaging is the growing area of neuroscience devoted to producing data with the goal of capturing processes and dynamics of the human brain. We consider the problem of inferring the brain connectivity network from time-dependent functional magnetic resonance imaging (fMRI) scans. To this aim we propose the symmetric graphical lasso, a penalized likelihood method with a fused-type penalty function that explicitly takes into account the natural symmetrical structure of the brain. The symmetric graphical lasso allows one to learn simultaneously both the network structure and a set of symmetries across the two hemispheres. We implement an alternating direction method of multipliers (ADMM) algorithm to solve the corresponding convex optimization problem. Furthermore, we apply our methods to estimate the brain networks of two subjects, one healthy and one affected by a mental disorder, and compare them with respect to their symmetric structure. The method applies once the temporal dependence characterizing fMRI data has been accounted for, and we compare the impact of different detrending techniques on the estimated brain networks. Although we focus on brain networks, the symmetric graphical lasso can be applied more generally to learn multiple networks in a context of dependent samples.
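The penalty combines sparsity with a fusion term that rewards symmetry between hemispheres. The sketch below assumes a simplified, hypothetical form. Variables are ordered so that region i of the left hemisphere is matched with region i + p of the right hemisphere. The penalty is an ℓ1 term on all off-diagonal entries of the precision matrix plus an ℓ1 term on the differences between mirrored within-hemisphere edges. The exact penalty in the paper may differ. The sketch also includes the soft-thresholding operator, the proximal map of the ℓ1 norm that appears in ADMM updates for lasso-type penalties. The full ADMM iteration is not reproduced.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrinks v towards zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def symmetric_penalty(theta, lam1, lam2):
    """Hypothetical sparsity + symmetry penalty for a 2p x 2p precision matrix.
    Rows/cols 0..p-1 are left-hemisphere regions; p..2p-1 their mirrors."""
    n = len(theta)
    p = n // 2
    # Lasso term: sparsity of all off-diagonal partial-correlation entries.
    sparsity = sum(abs(theta[i][j]) for i in range(n) for j in range(n) if i != j)
    # Fusion term: left-left edge (i, j) vs its mirror right-right edge.
    fusion = sum(abs(theta[i][j] - theta[i + p][j + p])
                 for i in range(p) for j in range(p) if i != j)
    return lam1 * sparsity + lam2 * fusion


# Two regions per hemisphere; no cross-hemisphere edges in this toy example.
theta = [[1.0, 0.5, 0.0, 0.0],
         [0.5, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.2],
         [0.0, 0.0, 0.2, 1.0]]
print(symmetric_penalty(theta, lam1=1.0, lam2=1.0))  # approximately 2.0
```

Increasing `lam2` pushes mirrored edges towards equal values, so edges whose estimates stay different under a large `lam2` point to asymmetries between the hemispheres.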